Direct Approximation of Quadratic Mutual Information and Its Application to Dependence-Maximization Clustering

Authors

  • Janya Sainui
  • Masashi Sugiyama
Abstract

Mutual information (MI) is a standard measure of statistical dependence of random variables. However, due to the log function and the ratio of probability densities included in MI, it is sensitive to outliers. On the other hand, the L2-distance variant of MI called quadratic MI (QMI) tends to be robust against outliers because QMI is just the integral of the squared difference between the joint density and the product of marginals. In this paper, we propose a kernel least-squares QMI estimator called least-squares QMI (LSQMI) that directly estimates the density difference without estimating each density. A notable advantage of LSQMI is that its solution can be analytically and efficiently computed just by solving a system of linear equations. We then apply LSQMI to dependence-maximization clustering, and demonstrate its usefulness experimentally.
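The abstract gives no formulas, so the following is only a minimal sketch of how a least-squares estimate of QMI could reduce to a linear system. It assumes Gaussian basis functions centered at the paired samples, a single shared bandwidth `sigma`, and an L2 regularizer `lam`; the function names and default values are illustrative choices, not taken from the paper.

```python
import numpy as np

def sq_dists(a, b):
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

def lsqmi_sketch(x, y, sigma=1.0, lam=1.0):
    """Least-squares fit of f(x, y) = p(x, y) - p(x)p(y), then plug-in QMI.

    Assumed model (not from the source): f is a weighted sum of products of
    Gaussian kernels centered at the n paired samples (x_l, y_l).
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n, dx = x.shape
    dy = y.shape[1]
    dxx = sq_dists(x, x)
    dyy = sq_dists(y, y)
    K = np.exp(-dxx / (2 * sigma**2))  # K[i, l] = kernel(x_i, x_l)
    L = np.exp(-dyy / (2 * sigma**2))  # L[i, l] = kernel(y_i, y_l)
    # h_l: basis function l averaged over the empirical joint, minus its
    # average over the product of the empirical marginals.
    h = (K * L).mean(axis=0) - K.mean(axis=0) * L.mean(axis=0)
    # H: integral of products of basis functions, closed form for Gaussians.
    H = (np.pi * sigma**2) ** ((dx + dy) / 2) * np.exp(-(dxx + dyy) / (4 * sigma**2))
    # The regularized least-squares solution is a single linear solve.
    theta = np.linalg.solve(H + lam * np.eye(n), h)
    # Plug-in estimate of the integral of f squared.
    return 2 * h @ theta - theta @ H @ theta

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
dep = lsqmi_sketch(x, x + 0.1 * rng.standard_normal(200))  # strongly dependent pair
ind = lsqmi_sketch(x, rng.standard_normal(200))            # independent pair
print(dep, ind)
```

In this sketch the density difference is fitted directly, so neither the joint density nor the marginals is estimated on its own. In practice `sigma` and `lam` would need to be chosen by some model-selection procedure such as cross-validation, which is omitted here.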

Similar resources

Dependence-Maximization Clustering with Least-Squares Mutual Information

Recently, statistical dependence measures such as mutual information and kernelized covariance have been successfully applied to clustering, an approach called dependence-maximization clustering. In this paper, we propose a novel dependence-maximization clustering method based on an estimator of a squared-loss variant of mutual information called least-squares mutual information. A notable advantage of the p...

Clustering of a Number of Genes Affecting in Milk Production using Information Theory and Mutual Information

Information theory is a branch of mathematics. Information theory is used in genetic and bioinformatics analyses and can be used for many analyses related to the biological structures and sequences. Bio-computational grouping of genes facilitates genetic analysis, sequencing and structural-based analyses. In this study, after retrieving gene and exon DNA sequences affecting milk yield in dairy ...

Informational Energy Kernel for LVQ

We describe a kernel method which uses the maximization of Onicescu’s informational energy as a criterion for computing the relevances of input features. This adaptive relevance determination is used in combination with the neural-gas and the generalized relevance LVQ algorithms. Our quadratic optimization function, as an L type method, leads to linear gradient and thus easier computation. We ob...

Information-Maximization Clustering Based on Squared-Loss Mutual Information

Information-maximization clustering learns a probabilistic classifier in an unsupervised manner so that mutual information between feature vectors and cluster assignments is maximized. A notable advantage of this approach is that it involves only continuous optimization of model parameters, which is substantially simpler than discrete optimization of cluster assignments. However, existing metho...

Equitability Analysis of the Maximal Information Coefficient, with Comparisons

A measure of dependence is said to be equitable if it gives similar scores to equally noisy relationships of different types. Equitability is important in data exploration when the goal is to identify a relatively small set of strongest associations within a dataset as opposed to finding as many non-zero associations as possible, which often are too many to sift through. Thus an equitable stati...

Journal:
  • IEICE Transactions

Volume: 96-D

Publication date: 2013